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Algorithms for simulating complex physical systems or solving difficult optimization problems 
often resort to an annealing process. Rather than simulating the system at the temperature of 
interest, an annealing algorithm starts at a temperature that is high enough to ensure ergodicity 
and gradually decreases it until the destination temperature is reached. This idea is used in popular 
algorithms such as parallel tempering and simulated annealing. A general problem with annealing 
methods is that they require a temperature schedule. Choosing well-balanced temperature sched¬ 
ules can be tedious and time-consuming. Imbalanced schedules can have a negative impact on the 
convergence, runtime and success of annealing algorithms. This article outlines a unifying frame¬ 
work, ensemble annealing, that combines ideas from simulated annealing, histogram reweighting 
and nested sampling with concepts in thermodynamic control. Ensemble annealing simultaneously 
simulates a physical system and estimates its density of states. The temperatures are lowered not 
according to a prefixed schedule but adaptively so as to maintain a constant relative entropy between 
successive ensembles. After each step on the temperature ladder an estimate of the density of states 
is updated and a new temperature is chosen. Ensemble annealing is highly practical and broadly 
applicable. This is illustrated for various systems including Ising, Potts, and protein models. 
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I. INTRODUCTION 

Simulational science often involves the generation of 
configurations from high-dimensional probability distri¬ 
butions as well as the computation of ensemble averages 
and normalization constants. Numerous applications in 
statistical physics, biomolecular simulation and Bayesian 
inference illustrate the ubiquitous need for efficient sam¬ 
pling methods. Challenges are posed by the complexity 
of the system, its shear size, slow convergence and non- 
ergodicity. 

To address these challenges, algorithms that work with 
modified versions of the system have been proposed. One 
idea is to simulate the system at multiple temperatures 
and utilize the enhanced flexibility at higher tempera¬ 
tures to avoid local free-energy minima at lower tem¬ 
peratures. This idea is the basis of sampling algorithms 
such as replica-exchange Monte Carlo [T] and simulated 
tempering [5] but also used in popular optimization al¬ 
gorithms such as simulated annealing [3]. 

Parallel tempering (PT) [4], for example, considers 
a family of canonical ensembles at different tempera¬ 
tures. The ensembles are simulated independently and 
occasional swaps of configurations between ensembles at 
nearby temperatures allow the simulation to escape from 
metastable states. From a PT simulation thermody¬ 
namic quantities such as free energies and heat capac¬ 
ities can then be computed with high accuracy. But the 
success and convergence of a PT run depends critically 
on the choice of the temperature schedule. To choose 
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a good temperature schedule can be highly non-trivial, 
especially for systems undergoing phase transitions. A 
well-balanced schedule entails overlap between ensembles 
at neighboring temperatures. This means that we have 
to use more and more replicas with increasing system size 
because energy is extensive Moreover, PT explores 
temperature space on a fixed ladder. If we want to use 
multiple temperatures or control parameters as in multi¬ 
dimensional PT we are suffering from the curse of 
dimensionality. Another source of inefficiency is the fact 
that configurations at high temperatures are constantly 
being produced but no longer needed once the simulation 
has converged. 

Multi-canonical sampling algorithms [7] are a power¬ 
ful alternative to annealing methods. Rather than uti¬ 
lizing a temperature parameter to modify the system, 
multi-canonical algorithms draw configurations from an 
ensemble whose weight is inversely proportional to the 
density of states (DOS) of the system, such that ideally 
the energy histogram will be constant. However, this 
requires that we know the DOS before the actual simu¬ 
lation, which is rarely the case. 

The Wang-Landau (WL) algorithm [8] is an ingenious 
variant of multi-canonical sampling that sidesteps this 
problem. The unknown density of states is estimated 
in the course of a WL run, configurational samples are 
generated as a by-product. The fact that the correct 
DOS should produce a flat energy histogram can be used 
to monitor the convergence of the method. By gradually 
decreasing the learning rate, the simulation is stabilized 
and converges. 

WL sampling has originally been developed for discrete 
systems [S]. Its direct extension to large or continuous 
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systems requires choosing an energy range and binning. 
But there might be forbidden energy levels that cannot 
be visited, in which case the corresponding bins remain 
empty and the energy histogram will never be flat. These 
problems are aggravated for multi-dimensional DOS over 
more than one macrovariable because the number of bins 
grows exponentially in the number of macrovariables. In 
that case flatness of the energy histogram ceases to be 
a useful convergence criterion and must be replaced by 
other criteria [9] . To apply these modifications in practice 
remains a challenge and involves parameter tweaking. 

This article proposes an algorithm, ensemble anneal¬ 
ing, that solves these issues and produces samples and 
an estimate of the DOS. Ensemble annealing is inspired 
by the nested sampling method for Bayesian computa¬ 
tion [10] and can be viewed as a generalization of nested 
sampling to the canonical or other ensembles. The algo¬ 
rithm applies both to discrete and continuous systems. 
In contrast to simulated annealing or parallel tempering, 
ensemble annealing constructs an optimal temperature 
protocol adaptively and has only few algorithmic param¬ 
eters. 


II. ENSEMBLE ANNEALING 

Ensemble annealing is a sequential algorithm that 
steps through iterations denoted by k. N non-interacting 
particles or walkers are employed to explore a series of en¬ 
sembles typically starting in a high temperature ensem¬ 
ble, then cycling through ensembles at lower and lower 
temperatures, until the destination ensemble is reached. 
For each ensemble, the walkers produce configurations 
Xkn where n = 1,..., N. In contrast to other annealing 
and tempering methods only the start and final ensem¬ 
ble have to be chosen. The intermediate ensembles are 
found during the simulation by placing them such that a 
constant overlap between successive ensembles is main¬ 
tained. To implement this approach, we need to agree 
on various concepts, mainly what kind of ensembles will 
be considered, and how to measure distances between 
ensembles. 


A. Ensembles 

Let us denote the target ensemble from which we aim 
to generate configurations x by p{x). Often p{x) = 
Tr{x) with energy E{x) = — \np{x). In Bayesian 

inference, for example, 7r(a;) denotes the prior distribu¬ 
tion and E{x) corresponds to negative log likelihood. In 
a physical simulation of particles in a box, 'k{x) will be 
uniform over the box and E will be the interaction en¬ 
ergy between all particles. Note that in practice both 
7r(x) and p{x) are often unnormalized. 

To draw configurations from p{x) we consider a series 


of ensembles 

Pk{x) = — qk[E{x)\ tt{x) (1) 

Ck 

where Ck = f qk[E{x)]TT{x) dx normalizes the fc-th en¬ 
semble. Here, we assume that ensemble pk depends on 
the configuration only through the macrovariable E{x), 
but the method works also for more general ensembles. 
Typically qk{E) = q(E;Xk) where q{E-,X) is a parame¬ 
terized family and A a protocol parameter. The distri¬ 
butions qk{E) > 0 are intermediate helper or bridging 
distributions. In case of the canonical ensemble, config¬ 
urations are weighted by the Boltzmann factor 

qk[E) = exp{-/3feE;} (2) 

where jSk is the inverse temperature and Ck is the parti¬ 
tion function. 

Obviously the canonical ensemble is a widespread 
choice in annealing methods, but it might also be worth¬ 
while to consider other ensembles. For example, in par¬ 
allel tempering the use of the Tsallis ensemble 

^ {l + /3(afc-l)(E-E,„i„)}i/0.-i) 

with control parameter ak > IE], inverse temperature 
P and minimum energy Emin < E{x) has been proposed 
El- A multi-parameter combination of the Boltzmann 
and Tsallis ensemble is used in complex Bayesian data 
analyses m to independently control the prior density 
and the likelihood function. 

Another ensemble that is of potential interest is the 
Fermi distribution 

qkiE) ^ ( 4 ) 

which has two control parameters: the inverse tempera¬ 
ture Pk and an energy cutoff e^. The zero-temperature 
Fermi ensemble approaches a stepfunction, i.e. configu¬ 
rations with energies greater than Ck are assigned zero 
probability: 

qk{E) = Q{ek-E) (5) 

where 0(-) is the Heaviside step function. This ensem¬ 
ble is used in the nested sampling method for Bayesian 
computation [10] and also related to the microcanonical 
ensemble [UlTi]: 

qk{E) = Q{ek-E){ek-Ef/^-^ ( 6 ) 

where d is the dimension of configuration space (number 
of configurational degrees of freedom) and the total 
energy of the system (potential plus kinetic energy). 

Note that the target ensemble p{x) does not neces¬ 
sarily need to be a member of the bridging family, i.e. 
there might be no Pk(x) such that p{x) oc Pk(x), which 
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is the case, for example, in nested sampling and the mi- 
crocanonical ensemble. 

The density of states (DOS) over the prior or reference 
distribution 7r(x) is defined as 



S(E — E(x)) 7r(x) da; 


( 7 ) 


different ensembles might also be useful. For example, 
we could use the exchange rate of a parallel tempering 
simulation 


Ri/3k,Pi) = - 

CkCl 


mm{qk{Ei)qi{E2),qk{E2)qi{Ei)} 

X g{Ei)giE2)dEidE2 (12) 


with 6{-) denoting the delta function. With the help of 
the DOS it is straightforward to compute how the ener¬ 
gies are distributed in the intermediate ensembles: 

Pk{E)= [ 5{E - E{x))pk{x)dx = — g{E)qk{E) (8) 

J Ck 

where pk {E) is a one-dimensional distribution. 


B. Relative entropy 

When choosing the intermediate distributions that 
bridge between the initial and final ensemble, it is essen¬ 
tial to control the “distance” or overlap between succes¬ 
sive ensembles pk{x) and pk+i{x). We use the Kullback- 
Leibler (KL) divergence [16] or relative entropy 

D{p\\q) = Jp{x) \n[p{x)/q{x)] dx (9) 

for this purpose. The relative entropy satisfies the Gibbs 
inequality D(p\\q) > 0 with equality only if p and q 
are identical. Therefore the Kullback-Leibler divergence 
qualifies as an “entropic distance” between ensembles p 
and q. However in contrast to a true distance the KL di¬ 
vergence is not symmetric under interexchange of p and 
q. It is only well-defined if q is “broader” than p, i.e. if 
the support of p is contained in the support of q, and 
therefore a directed divergence. In information theory, 
the KL divergence is used to quantify information gain. 

Let us now consider the relative entropy between two 
members pk and pi of the family of bridging distributions 
[Eq. (^]. With the help of the DOS we can reduce the 
high-dimensional configurational integral [Eq. ([^] to a 
one-dimensional integral over the energies: 

DiPk\\pi)= J Pfe(E)ln|^^^^^|d£; 

= (ln(gfe/gi))fc - ln(cfc/ci) (10) 

where {■)k denotes an average over the fc-th ensemble pk. 
Eor the canonical ensemble [Eq. (§] the relative entropy 
reduces to 


as a measure to compare ensembles. The Jensen-Shannon 
divergence HZ], a symmetrized version of the relative 
entropy, has been used in thermodynamic control [18j . 
The Hellinger distance [19] is a widespread distance used 
mainly in statistics and may also provide a useful mea¬ 
sure for comparing ensembles. In this article, however, 
we have not explored measures for comparing ensembles 
other than the relative entropy. 

Given a continuous bridging family, the optimal an¬ 
nealing protocol would involve infinitely many steps (adi¬ 
abatic annealing). We want to reach the target ensemble 
in finitely many steps but produce intermediate ensem¬ 
bles that have a fixed and Hnite relative entropy D. We 
will later see that for small D this amounts to cooling 
with constant thermodynamic speed. As we move from 
ensemble pk to the next ensemble Pk+i we need to eval¬ 
uate their relative entropy D{pk+i\\Pk)- Equation (10) 
shows that this involves the computation of ensemble 
averages as well as the estimation of free energy differ¬ 
ences. These are challenging computational problems, 
which can be solved by the methods outlined in the next 
subsection. 


C. Estimation of the relative entropy 

Because the relative entropy [Eq. both involves 

the normalization constants Cfc, Q as well as an ensem¬ 
ble average, it is computationally challenging to evaluate 
D(jpk\\pi) accurately. However, if we know the density 
of states g{E), the conhgurational integrals can be re¬ 
duced to low-dimensional integrals. Therefore, ensemble 
annealing estimates g{E) during the course of the simu¬ 
lation, similar to the Wang-Landau method [S] or nested 
sampling [10]. The estimation of the DOS relies on his¬ 
togram methods [iniiii]- 

If we work with N non-interacting walkers at the k- 
th iteration, the configurations are denoted by Xkn (i-e. 
the first index indicates the ensemble, whereas the sec¬ 
ond index enumerates the walkers). At each ensemble 
annealing iteration fc, a non-parametric estimate of the 
DOS 


D{pk\\pi) = (A - fik){E)p, + PkF{Pk) - PiEifii) (11) 

where F(/3) = —j3~^ logc(/3) is the free energy at inverse 
temperature 13. 

Throughout this article, we will use the relative en¬ 
tropy to measure the distance between ensembles pk and 
pi. Other measures that quantify the overlap between 


N k-1 

g‘^^\E) = Y,Y.9kl5^E-Ek,^) (13) 

n—1 k'—O 

is updated where Ekn = E{xkn) are the energies of the 
visited conhgurations. The discrete DOS g\^ assigns a 
weight to every conhguration xin that has been generated 
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by the walkers during the entire simulation up to the cur¬ 
rent ensemble pk- That is, the vector of all weights ex¬ 
pands in each iteration and is constantly updated (which 
is indicated by the superscript). 

With the help of the estimated DOS it is straightfor¬ 
ward to compute the relative entropy between two en¬ 
sembles Pk and Pi: 


k-l N 


D{pk\\Pl) 


9k>nQk{Ek'n) <lk{Ek'n) 
^ ^ Ck qi{Ek'n) 


k'—O n—1 


where 


ln(cfe/c;) 

(14) 


k-l N 

QkiEk'n)- (15) 

k'—O n—1 


These relations are used in histogram methods for esti¬ 
mating free energy differences miiM]. The weights are 
obtained using the histogram iterations 


(k) 

9k'n 


oc 


k-l 

E 

i=0 


qi{Ek'n)/ci 


(16) 


ik') 

in which each update of the weights is followed by 
their normalization and a re-evaluation of the partition 
functions Ck' according to Eq. ( [T^ . We start the itera¬ 
tion from the previous DOS estimate (setting the weights 
of the new states Xkn to zero), which speeds up the con¬ 
vergence of the histogram iterations. 


D. Initialization and equilibration of ensembles 


The estimated DOS serves two purposes: First, to es¬ 
timate the relative entropy between two ensembles reli¬ 
ably; second, to initialize the walkers to sample the next 
ensemble by recycling configurations that have been gen¬ 
erated previously, which are then equilibrated in the new 
ensemble. In the A:-th ensemble annealing iteration, en¬ 
semble pk{x) is approximated by 


Pk{x) ~ — E E 9k^iqk{Ek'n)S{x - Xk'n)- (17) 

fc'=0n=l 

We use this approximation to generate N initial states for 
the walkers by the following scheme: First, we draw an 
energy level according to the probability g^^l^qkiEk'n)/Ck- 
Second, we randomly pick one among all configurations 
that map to the energy drawn in the first step. In con¬ 
tinuous systems, it is very unlikely that two configura¬ 
tions were generated that have exactly matching energies. 
However, in discrete systems such as the two-dimensional 
Ising model there are only finitely many energy levels. 
In this case, we can speed up the DOS estimation [Eqs. 
(151 and (16)] by working with histograms as explained in 
m- Due to the limitation of the approximation ( [l7| ), the 
N recycled states need to be equilibrated in the correct 
ensemble [Eq. 0 ] Pkix) oc T:{x)qk{E{x)) using Monte 
Carlo or molecular dynamics simulations. 


k-l N 


E. Algorithm 


We have now all tools at hand to formulate the en¬ 
semble annealing algorithm. Ensemble annealing is an 
adaptive sequential Monte Carlo algorithm. The main 
parameters are the number of walkers N and the relative 
entropy D between successive ensembles pk and Pk+i- 
Choosing ensembles with a constant relative entropy en¬ 
sures that the annealing process proceeds at a constant 
thermodynamic speed. Iteration k comprises the follow¬ 
ing steps: 


(i) 


Initialization: Usin g th e current estimate of the 
DOS g^^\E) [Eq. (13)], the particles are initial¬ 
ized by drawing N energies from qk{E) [Eqs. 
(|^ and ( [T^ ] and finding the corresponding config¬ 
urations by a simple lookup in the energy table 


such that E(x'j,„) = Because is only an 

approximation, the initial states will not be equili¬ 
brated. 


(ii) Equilibration: The states are equilibrated in the 
new ensemble pk{x) by running Monte Carlo or 
molecular dynamics simulations starting from x'f.^ 
and producing new states Xkn- The new configura¬ 
tions and energies Ekn are added to the pool of all 
states visited so far. 


(iii) DOS estimation: A new estimate of the DOS, 
is computed from all energies and temper¬ 
atures using non-parametric histogram reweighting 
|21) . To speed up the convergence, the previous 
DOS estimate is used to initialize the iterations. 


(iv) Annealing: The next ensemble is adjusted such 
that it has a desired relative entropy D with re¬ 
spect to the current ensemble pk, i.e. Pk+i satisfies 
D{pk+i\\pk) = D. 


The algorithm has only few parameters, namely the ini¬ 
tial and final ensemble, the number of walkers N and the 
target relative entropy D between successive ensembles. 
Evidently, D determines the cooling or compression rate. 
For smaller D the overlap between successive ensembles 
is larger and the annealing progresses more slowly. If we 
allow D to be large, we anneal faster but risk to fail to 
equilibrate. 


F. Application to the harmonic oscillator 

Let us illustrate ensemble annealing for a simple sys¬ 
tem, the one-dimensional harmonic oscillator with energy 
E{x) = {x — xo)^/2 and ground state Xq in the canonical 
ensemble: 

Pk{x) = p{x\l3k) = Y^exp - a^o)^| ■ 
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ensemble annealing iteration k euerg)/ E{x) 

FIG. 1. (Color online) Ensemble annealing of a one-dimensional particle in a Schwefel potential E(x) = —x sin(y^[!^) (shown 
on the right) using 100 walkers. Every stripe marked by a dashed boundary shows the configurations of the walkers after 
the equilibration step. A random number has been added to the x-coordinates (iteration index) for better visualization. All 
Particles within each stripe experience the same temperature during equilibration. 


The distance between two ensembles at inverse tempera¬ 
tures P and /3', /3' > /?, is: 

D{P'\\P) = \[{P/P')-1-HP/P')] (18) 

In this case the KL divergence depends only on the ra¬ 
tio between two successive temperatures. The constant 
relative entropy criterion yields a constant cooling rate 
p = /3//3',0 < p < 1 which is determined by 

p — \TLp = 2D + l. (19) 

This results in the geometric schedule Pk = Po P~^ ■ For 
—)■ 0 we reach the adiabatic limit of infinitely slow 
cooling since p —>■ 1. Geometric schedules have been 
proposed for optimal simulated annealing [S]. 

Alternatively, we could consider the ground state a 
control parameter, P = a;o, and let 7r(a:) oc exp{—a;^/2}, 
E(x) = —X with \nZ{P) = (3^/2 and {E) = —/3. The 
relative entropy is now according to Eq. 

D{P'\\P) = + Y - (/5 - = 2 

Constant steps in the relative entropy lead to a schedule 
that is linear in the inverse temperature: Pk = Po E 
\/2D k. That is, the ground state is shifted either in 
the positive or the negative direction depending on the 
targeted ground state. 

These examples highlight that it is not sufficient to pre¬ 
scribe the relative entropy to choose the next ensemble. 
We must also impose some sense of directionality in order 
to shift the ensemble closer to the target ensemble. This 
will become particularly important in multi-dimensional 
annealing. 


III. CANONICAL ENSEMBLE 

We will now apply annealing of the canonical ensem¬ 
ble to various systems including discrete systems such as 
Ising and Potts models and a continuous protein model. 


A. One-dimensional example 

To illustrate ensemble annealing, we first apply it to a 
system with a one-dimensional configuration space and 
a rugged energy function E(x) = — a: sin(-\/|ajj), the one¬ 
dimensional Schwefel function. We deliberately choose a 
large number of walkers for illustrative purposes {N = 
100); a smaller number of walkers would be sufficient in 
this one-dimensional example. At every iteration, equi¬ 
libration is achieved by using a random walk Metropolis 
Monte Carlo scheme [55] consisting of 10 random steps 
drawn from a uniform distribution. The relative entropy 
is set to D = 10“^. Figurej^shows the configurations at 
the various temperatures obtained by the constant rela¬ 
tive entropy criterion. We start at initial = 0 and target 
a final inverse temperature Ynai = 1- As ensemble an¬ 
nealing progresses the walkers become more and more 
localized in the dominant modes of the target ensemble. 
The relative proportions are reproduced accurately. 

This example also suggests that it should be possible 
to prune the number of the walkers during the anneal¬ 
ing process. In the course of annealing, the ensemble 
becomes more and more concentrated (as monitored by 
a decrease in the entropy Sk = —/pfclnpfc), and fewer 
walkers are needed to explore and represent it. Using the 
Boltzmann relation S = k log W where W is the number 
of accessible microstates, we could decrease the number 
of walkers in each iteration and thereby save computa¬ 
tional resources. However, we have not explored this 
strategy further in this article. 
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B. Ising and Potts model 

We now apply ensemble annealing to the two- 
dimensional Ising and Potts model. Figurej^shows simu¬ 
lation results for the 32 x 32 lattice. N = 10 particles were 
used and the relative entropy was set to D = 10“^. The 
equilibration step consisted of Metropolis Monte Carlo 
runs that randomly select lattice sites and try to flip the 
spin (Ising model) or draw a random color (Potts model). 
Figure [^a) shows how the algorithm improves the initial 
DOS estimate. The algorithm starts with N random spin 
configurations from which the initial DOS covering only 
a limited energy range is derived. As the algorithm pro¬ 
ceeds, lower energy states are generated and the DOS 
expands into the lower energy region. This process con¬ 
tinues until the full energy range has been explored and 
a highly accurate estimate of the DOS is produced. The 
accuracy of the estimated DOS is illustrated in Fig. [^b). 
The estimation error can be as small as with WL sam¬ 
pling [5]. A similar accuracy is also obtained for the ten 
state Potts model which undergoes a first order phase 
transition. The DOS tends to be more accurate for the 
low energy states. In most situations this is desirable be¬ 
cause one is primarily interested in the thermodynamic 
properties of the system at finite temperatures, at which 
the low energy states contribute most strongly. 


C. Protein model 

Ensemble annealing readily applies to continuous sys¬ 
tems such as Go models that have been used extensively 
to study protein folding (see e.g. [IS]). In our version 
of the Go model, the dihedral angles are the only con¬ 
formational degrees of freedom; bond lengths and angles 
are fixed to ideal values. The energy function is com¬ 
prised of a generic non-bonded energy potential and the 
Go term. The non-bonded energy penalizes atom clashes 
using the same quartic repulsion term as in Ref. m- 
The Go term enforces the native structure by imposing 
a Lennard-Jones potential on the Ga distances between 
residues in contact in the native state. The inverse tem¬ 
perature P serves as the control parameter in ensemble 
annealing runs. As in Ref. [T3|, we used hybrid Monte 
Carlo [55| for equilibration. We seeded the simulation 
with 100 random structures and annealed an ensemble 
of iV = 30 structures; the relative entropy was set to 
D = 10“^. For reference, we also ran a parallel temper¬ 
ing simulation of the Go model using 37 temperatures. 
The DOS obtained with ensemble annealing was used to 
optimize the temperatures to produce an exchange rate of 
48% on average. 10000 replica transitions were simulated 
and an estimate of the DOS was obtained by running his¬ 
togram reweighting. 

We studied the Go model derived for a small protein 
domain, the 59 amino-acid Fyn-SH3 domain (PDB code 
ISHF). Figure [^a) shows the density of states obtained 
with ensemble annealing and compares it to the reference 
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(a) Ensemble annealing of the 32 X 32 Ising model 




(b) DOS of the 32 X 32 Ising and Potts model 

FIG. 2. (Color online) (a) Improvement of the estimated DOS 
during ensemble annealing. Four equally spaced stages of en¬ 
semble annealing were picked; the iteration index k is indi¬ 
cated in the legend. The true DOS |24| is shown as a thick 
black curve; the estimated DOS as yellow [gray] area, (b) Es¬ 
timated DOS for the Ising (left) and the ten state Potts model 
(right). Again the black line shows the true DOS and the yel¬ 
low [gray] line the DOS produced during ensemble annealing. 
The insets show the relative error in the microcanonical en¬ 
tropy s{E) = lng{E). 
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FIG. 3. (Color online) Ensemble annealing of the Go model. 
Left panel: Density of states obtained with ensemble anneal¬ 
ing (yellow [gray] line) and a parallel tempering simulation 
(thick black line). Right panel: Fraction of native contacts 
{Q)p as a function of the inverse temperature /? (black area). 
The heat capacity is indicated by the yellow [gray] line and 
has been scaled to match the ordinate range. It peaks at the 
folding temperature p « 1.3. The ribbon diagrams show con¬ 
figurations in the unfolded state (black ribbon on the left) 
and in the folded state (white ribbon on the right). 
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(a) Energy histograms 




(b) Temperature schedule and PT swap rates 


FIG. 4. (Color online) Ensemble annealing of ten state Potts 
model, (a) Energy histograms at the temperatnres found by 
ensemble annealing. Only every twentieth histogram is shown 
for clarity. The temperature is indicated by the color, (b) 
Left: Temperature schedule. The thin yellow [gray] line in¬ 
dicates the schedule found by ensemble annealing. The thick 
black line corresponds to the protocol based on minimum en¬ 
tropy production |27| . The dashed gray line indicates the 
critical temperature. Right: PT swap rates obtained with 
the temperature schedule. The heat capacity is shown as yel¬ 
low [gray] curve. 


computed with an exhaustive parallel tempering simula¬ 
tion. The agreement is very high over the entire energy 
range. In Figure |^b) we study the characteristics of the 
Go model as revealed by ensemble annealing. Shown is 
the average number of native contacts Q as a function of 
the inverse temperature. The folding transition is marked 
by a sudden increase in the number of native contacts. 
The heat capacity peaks at /3 = 1.3 indicating a folding 
temperature of roughly 0.77. 


IV. SCHEDULES AND PATHS 


We will now have a closer look at the schedules con¬ 
structed by ensemble annealing and compare them to 
other schedules that have been proposed in the literature. 
Moreover, we discuss the possibility to use ensemble an¬ 
nealing as a numerical method to construct near-optimal 
thermodynamic paths. 


A. Temperature schedule 

Figure shows the energy histograms and tempera¬ 
ture schedule found by ensemble annealing for the ten 
state Potts model. By way of construction of the sched¬ 
ule, the energy histograms of successive ensembles have 
a constant overlap (Fig. |^a)). The temperature sched¬ 
ule is non-trivial and deviates from the linear, geomet¬ 
ric, and logarithmic schedules that have been proposed 
in the literature min]. Initially, the inverse tempera¬ 
tures grow sublinearly. In this phase, the schedule con¬ 
structed by ensemble annealing is reminiscent of the log¬ 
arithmic schedule proposed by Geman and Geman [28) . 
i.e. /3fe = /3oln(l-l-fc). As ensemble annealing approaches 
the critical temperature, the cooling rate it slowed down 
automatically such that the system is not quenched and 
avoids being trapped in a metastable state. Beyond the 
critical point, the temperatures show a super-exponential 
increase (Fig. [^b)). 

Salamon and co-workers have proposed an adaptive 
version of simulated annealing more than two decades ago 
[271 [29]. Their algorithm finds the temperature sched¬ 
ule by minimizing the entropy production whereupon the 
temperature changes inversely proportional to the square 
root of the heat capacity. This rule follows directly from 
the constant relative entropy criterion. For small changes 
in inverse temperature, we have 

D(/3 + (5/311/3) « ^(5/3)25^ In c(^) (20) 

where 

djlnciP) = {{E - {E)f)p = C{/3)/l3^ 

is proportional to the heat capacity If the desired 

relative entropy D is small, the increment in inverse tem¬ 
perature is 


(5/3 = I3^2D/C{p). 

Integration over the inverse temperature increments 

/ k 

(5/3«^/3,y2^ (21) 

1=1 

generates a schedule that is very close to the one found 
by ensemble annealing at finite D (Fig. |^b)). Compar¬ 
ison with the schedule derived by Salamon et al. shows 
that \/1J is proportional to the thermodynamic speed of 
the annealing process. In the context of Bayesian com¬ 
putation, similar, but independent arguments have been 
put forward by Skilling [10] who uses D to control the 
rate of compression as the system moves from the prior 
to the posterior probability. 

From a practical point of view, an ensemble annealing 
run can be used to seed a parallel tempering simulation 
that has a well-balanced schedule and equilibrated ini¬ 
tial states. The right panel in Fig. |^b) illustrates that 















the exchange rates are indeed uniform for a PT simula¬ 
tion when using every fifth temperature of the ensemble 
annealing schedule. A drop in the swap rate is only ob¬ 
served close to the critical temperature where the heat 
capacity peaks. 


B. Minimal dissipation paths 


Let us now see if the results of the previous section 
generalize to multiple temperatures. Although ensemble 
annealing can be applied to any family of bridging dis¬ 
tributions [Eq. Q], let us focus on parametric families 
of the form 

P(a^|A) = q[E{x)-,X] tt{x) 

c{A} 

where A denotes the vector of all control parameters. The 
second order expansion of the relative entropy is [18] : 

^(A'||A)« ^(A'-Af/(A)(A'-A) (22) 

where the zero and first order term vanish because 
Z? (A11 A) = 0 and A' = A is the global minimum of Z? (A' 11 A) 
viewed as a function of X'. Because the Fisher informa¬ 
tion matrix 

/(A) = y[VAlnp(a;|A)][VAlnp(a;|A)]^p(x|A)da; (23) 
is positive definite, it defines a metric on the space of 


distributions parameterized by A. Equation (20) is a spe¬ 


cial case of the general relation (22) for the Boltzmann 


ensemble with a single temperature, where the Fisher in¬ 
formation is simply lnc(/3). 

In statistics, the Fisher metric has been studied since 
the beginnings of information geometry. The Fisher in¬ 
formation can also be used to define a thermodynamic 
length and action (see m and references therein). Qua¬ 
sistatic processes that switch between two thermody¬ 
namic states follow minimal dissipation paths in A space. 
These can be computed by minimizing the thermody¬ 
namic length (see, for example, [1811301 [31] ). Therefore, 
the optimal path is a geodesic on the Riemanian mani¬ 
fold equipped with the Fisher information metric. Very 
similar results have been presented by Gelman and Meng 
in their work on bridge and path sampling m- 

By taking constant but finite steps in relative entropy 
followed by an equilibration, ensemble annealing approxi¬ 
mates a quasistatic process. After K successful equilibra¬ 
tions, the relative entropy accumulated during ensemble 
annealing is 


KD 


K-l 

Y, D{Xk+i\\Xk) ^ - / A^/(A)Adt 

k^O 


(24) 


and approximates the thermodynamic action due to Eq. 
(22). If we aim to optimize the use of computing re¬ 


sources, we have to minimize if, the number of bridg¬ 
ing distributions. For a single control parameter this is 


straightforward: we have to follow the geodesic towards 
the destination ensemble. In the canonical ensemble, for 
example, if the destination temperature is lower than the 
initial temperature (annealing), we have to increase /3 
such that /3fc+i > Pk also for all intermediate temper¬ 
atures. For ensembles with multiple control parameters 
the situation is more complicated because minimizing the 
accumulated relative entropy [Eq. (24)] requires the com¬ 
putation of a discrete geodesic. However the DOS is gen¬ 
erally unknown, and we can compute i(A) only in the 
vicinity of the current state. It is not possible to evalu¬ 
ate reliably the length of an entire path connecting the 
initial and the destination ensemble. We can only search 
locally without any guarantee that the generated path is 
close to the geodesic. 


V. NON-BOLTZMANN ENSEMBLES 

A. Tsallis ensemble 


A major advantage of using the Boltzmann distribu¬ 
tion ([^ as bridging family is that many powerful meth¬ 
ods to simulate the canonical ensemble exist. We can use 
these algorithms in the equilibration step. But it can be 
beneficial to consider also other ensembles, because they 
might bridge more efficiently between the initial and final 
ensemble. The Tsallis ensemble has been used previously 
in combination with parallel tempering mils]. The mo¬ 
tivation for this choice is that due to the heavier tails of 
the Tsallis ensemble replicas have a larger overlap and 
can exchange states even if they show large energy dif¬ 
ferences. As a consequence, the number of intermediate 
replicas should be smaller than with the Boltzmann en¬ 
semble. 

This is indeed confirmed by an analysis of the 32 x 32 
Ising model. Test calculations based on the correct DOS 
show that the canonical ensemble requires 273 Pk to reach 
the destination ensemble (/3finai = 1) at a relative entropy 
oi D = 10“^, whereas the Tsallis ensemble needs only 85 
Uk values to bridge between Ogtart = 1-06 (corresponding 
to a very high canonical temperature) and agnai = 1-0. 
However, in practice this apparent advantage does not 
hold up. The reason is that the Tsallis ensemble typi¬ 
cally yields multimodal energy distributions at interme¬ 
diate Ofc. To see this let us first consider the more general 
case where a parametric bridging family q{E] A) is used. 
According to Eq. (|^ the energy distribution at A is pro¬ 
portional to g{E)q{E\X) and peaks at E solving: 


0 = s'{E) + 


q'{E;X) 

q{E-X) 


P{E) + 


q'{E-X) 

q{E-X) 


where s{E) = \ng{E) and P{E) = s'{E) are the micro- 
canonical entropy and inverse temperature. In case of the 
canonical ensemble, this equation is simply P{E) = P, 
that is the energy distribution peaks at the energy E 
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FIG. 5. (color online) Annealing of the 32 x 32 Ising model 
in the Tsallis ensemble, (a) Probing different Em\n at fixed 
a = 1.0025. The thick black line shows the microcanonical 
temperature P{E). The dashed lines show the right hand 
side of Eq. (251 for different i?min. The curve produced 
with Emin set to the ground state energy Eq intersects f3{E) 
twice corresponding to two peaks in the energy distribution 
g{E)q[E\a). (b) Comparison of ensemble annealing of the 
Tsallis ensemble with Emin = Eq = —2L^ (dashed red line) 
and Emin = 2Eo = —4E^ (yellow solid line). The true DOS 
is shown as thick black line. 


whose microcanonical temperature matches the canoni¬ 
cal temperature. In case of the Tsallis ensemble ([^, we 
have: 


m) 


_ p _ 

l + /?(a-l)(E-Emin) 


(25) 


This equation can have multiple solutions depending on 
a and Eminj which is why it is difficult to get anneal¬ 
ing of the Tsallis ensemble running in a stable fashion. 
Not only the control parameter a, but also the mini¬ 
mum energy Emin plays a critical role (see Fig. [^. If 
^'min is exactly set to the energy of the ground state Eg, 
the energy distribution of the Ising model becomes bi- 


modal with a sharp peak around the ground state energy 
and a second peak corresponding to high temperatures. 
That is, in order to generate samples from this ensem¬ 
ble we have to simulate two phases simultaneously. As 
a consequence the DOS estimate produced by ensemble 
annealing shows systematic errors (Fig. [^b)), despite 
producing an efficient schedule with 103 bridging distri¬ 
butions. If we lower Emin, the phase separation is less 
dramatic and consequently the DOS estimate is as accu¬ 
rate as with the Boltzmann ensemble. But we also lose 
the efficiency of the Tsallis ensemble in bridging large en¬ 
ergy differences, which is reflected in the larger number 
of Qffc: 230 ak for Emin = 2Eo which is similar to the 270 
temperatures produced by Boltzmann annealing. This 
shows that Emin is an additional algorithmic parameter 
which is delicate to choose. 


B. Microcanonical ensemble and nested sampling 

Nested sampling has been invented by Skilling m to 
solve Bayesian inference problems. Bayesian inference 
demands that we draw from a posterior distribution p{x) 
and compute its normalization constant, which are es¬ 
sentially the tasks that ensemble annealing addresses. In 
Bayesian inference Tr{x) is the prior, L{x) = the 

likelihood function; the destination ensemble that we aim 
to characterize is the posterior distribution over some in¬ 
ference parameter(s) x. 

Nested sampling is based on the microcanonical ensem¬ 
ble (/(E; e) = 0(e — E) [Eq. if]; the control parameter 
e is the maximum energy that the system is allowed to 
reach [52] • Therefore nested sampling can be viewed as 
a special case of ensemble annealing based on a zero- 
temperature Fermi or the microcanonical ensemble. The 
relative entropy between two ensembles [Eq. with 

energy levels e' < e simplifies to: 

E(e'||e)=ln[c(e)/c(eO] (26) 

where the normalization constant 

c(e)= f 5 (E) dE (27) 

-Emin 

is the cumulative DOS or configuration space volume. 
Erom a Bayesian point of view, c(e) is the prior mass 
enclosed by the likelihood contour L = e~'^. The control 
parameter is reduced from infinity to the energy of the 
ground state Emin- 

There are several differences between nested sampling 
and annealing of the ensemble ([^ using e as control pa¬ 
rameter. These differences result from the fact that all of 
the features that ensemble annealing aims to implement 
explicitly are built-in to nested sampling. In fact, nested 
sampling’s design principles served as a guide to develop 
the ensemble annealing algorithm. 

Ensemble annealing uses histogram methods to esti¬ 
mate the DOS, whereas nested sampling utilizes order 
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FIG. 6. (Color online) Energy contours tk found by nested 
sampling (thick black line) and ensemble annealing (yellow 
[gray] line) for the 16 x 16 Ising model. 


statistics due to the special form the of truncated ensem¬ 
ble ([^. As a consequence of the truncation, c{E) will be 
uniformly distributed over pk{E) (defined for E < e^), 
which is clear from Eq. ([^ [33]. Therefore the conhg- 
uration space volume associated with the maximum en¬ 
ergy state follows the distribution c(ifniax)/c(e) ^ Nt^~^ 
where 0 < < < 1 and Amax < e is the maximum en¬ 
ergy among all N walkers. Based on this result from 
order statistics, nested sampling estimates c{E) at well- 
dispersed energy cutoffs e^. 

Another elegant feature of nested sampling is that 
if H = 1/A^) the next ensemble achieving a compres¬ 
sion of D is the one in which the energy is bounded by 
Amax- This results from the fact that (Il(ifniax||e)) = 
— (logt) = \/N where the average is over the Beta distri¬ 
bution Nt^~^. Therefore the search for the next control 
parameter will simply yield Ck+i = Amax, and we only 
have to resample the state with the highest energy. 

Although nested sampling is much more efficient at 
cooling the truncated ensemble ([^, it is also possible to 
run the ensemble annealing algorithm. Both methods 
produce comparable sequences of energy levels for the 
16 X 16 Ising model with A^ = 10 and D = \/N = 0.1 (see 
Fig. §. Also the estimated DOS is of similar accuracy. 
For this example, nested sampling runs at a speed that is 
three orders of magnitude faster than ensemble anneal¬ 
ing. This is due to the fact that DOS estimation and 
annealing (i.e. the choice of the next energy limit) are 
instantaneous in nested sampling, because they are built- 
in to the method. Ensemble annealing, on the contrary, 
needs to run the histogram iterations for every energy 
contour. The histogram iterations converge only very 
slowly. Each iteration is dominated by DOS estimation 
because equilibration of the Ising model is very fast. For 


other systems such as proteins it will be the equilibration 
step rather than DOS estimation that consumes most of 
the computation time. In this situation, the discrepancy 
between nested sampling and ensemble annealing will not 
be as dramatic as for the Ising model. 

For the d-dimensional harmonic oscillator we have 
g{E) = and c{E) = As in the canon¬ 

ical ensemble, the relative entropy depends on the ra¬ 
tio of two successive control parameters: D{e\\e') = 
d/21n(e'/e). Therefore nested sampling and ensemble 
annealing progress geometrically according to Cfc = cq 
where p = Let us compare this to the ther¬ 

mal approach using the inverse temperature as a control 
parameter. The compression rate of the canonical dis¬ 
tribution is given by p — \np = 2D/d -I- I [Eq. (19)]. 
Therefore pp — Inpp = — Inp^ -|- 1 where pp and p^ are 
the compression rates of thermal and microcanonical an¬ 
nealing. Rewritten we have \n.{pp/p^) = P /3 — 1 < 0, 
and therefore pp < Pe This means that annealing 

the canonical ensemble compresses faster than annealing 
the microcanonical ensemble. We observe this for the 
application to the Ising model (Fig. [^. Canonical an¬ 
nealing with A^ = 10 walkers and a relative entropy of 
D = 0.1 requires only 42 iterations to reach the desti¬ 
nation ensemble. Microcanonical annealing and nested 
sampling, on the contrary, need approximately 1800 it¬ 
erations until convergence. The reason for this is that 
states accumulate at the maximum energy e, and there¬ 
fore nested sampling and microcanonical annealing will 
produce many intermediate ensembles in order to bridge 
between the initial and the destination ensemble. 


VI. CONCLUSION 

Ensemble annealing is a Monte Carlo algorithm that 
steps through a sequence of ensembles and generates con¬ 
formational samples. Along with the samples, it also es¬ 
timates the density of states using histogram methods. 
The ensembles are placed in an adaptive manner so as 
to maintain a constant, pre-chosen relative entropy be¬ 
tween successive ensembles. Ensemble annealing can be 
applied to a variety of bridging distributions, foremost 
the canonical ensemble but also to non-Boltzmann fam¬ 
ilies such as the Tsallis or the microcanonical ensemble. 
There is a close connection to the nested sampling algo¬ 
rithm. In fact, ensemble annealing aims to implement 
the features that are built-in to nested sampling: control 
of the compression or thermodynamic speed, as well as 
reliable estimation of the compression based on the DOS 
or the configuration space volume. Nested sampling is 
intimately tied to the truncated ensemble ([^, whereas 
ensemble annealing is more general in the choice of the 
ensemble itself, which can help to speed up the simula¬ 
tion and allows the use of samplers that work efficiently 
with a particular ensemble (such as, for example, hybrid 
Monte Carlo in the canonical ensemble). 

Ensemble annealing is also related to previous work by 
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Salamon and co-workers nziiin] on simulated annealing. 
Our approach is more general and gives richer results 
because it not only finds the system’s ground state but 
reconstructs the entire DOS. That way ensemble anneal¬ 
ing can be used to both simulate thermodynamic systems 
and solve difficult optimization problems. By means of 
the DOS, all visited states contribute to the computation 
of ensemble averages making our approach more robust. 


Moreover, it is possible to work with multiple control pa¬ 
rameters, which will be studied in future extensions of 
ensemble annealing. 
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